Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds
Authors
Abstract
Perkins’ Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts into a local search of policy space, offering a template for reinforcement learning that operates under partial observability of the state. In this paper, we generalize reinforcement learning under partial observability to the self-interested multiagent setting. We present a new template, MCES-IP, which extends MCES-P by maintaining predictions of the other agent’s actions based on dynamic beliefs over models. MCES-IP is instantiated to be approximately locally optimal with a given probability by deriving a theoretical bound on the sample size, which depends in part on the allowed sampling error; we refer to this algorithm as MCESIP+PAC. Our experiments demonstrate that MCESIP+PAC learns policies whose values are comparable to or better than those from MCESP+PAC in multiagent domains while using far fewer samples per transformation.
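As a rough illustration of the template the abstract describes (not the paper's exact algorithm or bound), the Python sketch below combines a Monte Carlo exploring-starts local search over policy space with a belief-weighted prediction of the other agent's actions and a Hoeffding-style PAC sample bound. The helpers `neighbors`, `simulate`, and `predict_other`, and the specific bound, are hypothetical stand-ins supplied for illustration.

```python
import math

def pac_sample_size(epsilon, delta, value_range):
    """Hoeffding-style bound: number of sampled returns needed so the empirical
    mean of a transformation is within epsilon of its expectation with
    probability at least 1 - delta (value_range bounds each return)."""
    return math.ceil((value_range ** 2) * math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def mces_ip(policy, neighbors, simulate, predict_other,
            epsilon, delta, value_range, max_iters=100):
    """Monte Carlo exploring-starts local search in policy space.

    policy        -- current reactive policy (hashable)
    neighbors     -- policy -> iterable of one-step transformations of that policy
    simulate      -- (policy, other_model) -> one sampled episode return, where the
                     episode begins from a uniformly chosen (action, observation)
                     pair (the "exploring start")
    predict_other -- () -> belief-weighted model of the other agent's actions;
                     maintaining this prediction is what distinguishes MCES-IP
                     from the single-agent MCES-P template
    """
    k = pac_sample_size(epsilon, delta, value_range)
    for _ in range(max_iters):
        other_model = predict_other()
        current = sum(simulate(policy, other_model) for _ in range(k)) / k
        q = {}
        for candidate in neighbors(policy):
            returns = [simulate(candidate, other_model) for _ in range(k)]
            q[candidate] = sum(returns) / k
        if not q:
            return policy
        best = max(q, key=q.get)
        # Accept a transformation only if its estimated gain exceeds the error allowance.
        if q[best] - current > epsilon:
            policy = best
        else:
            return policy  # no neighbor looks better: approximately locally optimal w.h.p.
    return policy
```

The domain supplies the three callables; the search stops when no one-step transformation improves the estimated value by more than the allowed error epsilon, which is the sense in which the learned policy is approximately locally optimal with high probability.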
Similar Resources
Learning to Act Optimally in Partially Observable Multiagent Settings: (Doctoral Consortium)
My research is focused on modeling optimal decision making in partially observable multiagent environments. I began with an investigation into the cognitive biases that induce subnormative behavior in humans playing games online in multiagent settings, leveraging well-known computational psychology approaches in modeling humans playing a strategic, sequential game. My subsequent work was in a s...
Learning Others' Intentional Models in Multi-Agent Settings Using Interactive POMDPs
Interactive partially observable Markov decision processes (I-POMDPs) provide a principled framework for planning and acting in a partially observable, stochastic and multiagent environment, extending POMDPs to multi-agent settings by including models of other agents in the state space and forming a hierarchical belief structure. In order to predict other agents’ actions using I-POMDP, we propo...
A reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation
This article presents a state estimation method based on Monte Carlo sampling in a partially observable situation. We formulate an automatic strategy acquisition problem for the multi-agent card game “Hearts” as a reinforcement learning (RL) problem. Since there are often many unobservable cards in this game, RL is dealt with in the framework of a partially observable Markov decision proc...
Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. Recently a Monte Carlo based distributed reinforcement learning approach was proposed, where agents take turns to learn best response...
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit t...
Publication year: 2016